In this notebook we are comparing the use of Alevin-fry and Spaceranger for quantifying spatial transcriptomics libraries. Two spatial transcriptomic libraries were quantified using Alevin-fry and Spaceranger and the results were combined following the Alevin-fry tutorial. We are comparing that to if we were to not integrate the Spaceranger data with Alevin-fry and only use Spaceranger. When performing this analysis all tools used an index with Ensembl 104. Here we will look at two libraries, SCPCR000372 and SCPCR000373. Alevin-fry was run both using the knee filtering and the unfiltered permit list mode. Included here is also use of Alevin-fry with the --sketch alignment mode and unfiltered permit list.
Note that SpatialExperiment was installed from Github, in order to reflect the most recent changes in read10XVisium at commit ddb15e0.
library(magrittr)
library(ggplot2)
library(SingleCellExperiment)
library(SpatialExperiment)
library(ggupset)
library(gridExtra)
Attaching package: ‘gridExtra’
The following object is masked from ‘package:Biobase’:
combine
The following object is masked from ‘package:BiocGenerics’:
combine
library(ggrepel)
library(clusterProfiler)
Registered S3 method overwritten by 'data.table':
method from
print.data.table
clusterProfiler v4.2.0 For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
If you use clusterProfiler in published research, please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141
Attaching package: ‘clusterProfiler’
The following object is masked from ‘package:IRanges’:
slice
The following object is masked from ‘package:S4Vectors’:
rename
The following object is masked from ‘package:stats’:
filter
library(org.Hs.eg.db)
Loading required package: AnnotationDbi
package ‘AnnotationDbi’ was built under R version 4.1.2
Attaching package: ‘AnnotationDbi’
The following object is masked from ‘package:clusterProfiler’:
select
# set seed for ORA
set.seed(2021)
# load in benchmarking functions that will be used for copying data and generating sample tables
function_path <- file.path(".." ,"benchmarking-functions", "R")
file.path(function_path, list.files(function_path, pattern = "*.R$")) %>%
purrr::walk(source)
# set up file paths
base_dir <- here::here()
# folder with alevin-fry and cellranger quants from S3
data_dir <- file.path(base_dir, "data", "spatial")
quants_dir <- file.path(data_dir, "data", "quants")
# results directory
results_dir <- file.path(data_dir, "results")
# sample name
sample_ids <- c("SCPCR000372", "SCPCR000373")
mito_file <- file.path(base_dir, "sample-info", "Homo_sapiens.GRCh38.103.mitogenes.txt")
# read in mito genes
mito_genes <- readr::read_tsv(mito_file, col_names = "gene_id")
Rows: 111 Columns: 1
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): gene_id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mito_genes <- mito_genes %>%
dplyr::pull(gene_id) %>%
unique()
# download alevin fry and cellranger output
aws_copy_samples(local_dir = quants_dir,
s3_dir = "s3://nextflow-ccdl-results/scpca",
samples = sample_ids,
tools = c("alevin-fry-knee", "alevin-fry-unfiltered", "cellranger"))
Now let’s take a look at comparing the two methods of using Alevin-fry + Spaceranger to only Spaceranger for quantification. To do this, we will read in the Alevin-fry + Spaceranger combined and Spaceranger only SpatialExperiment objects separately and then merge them into one list before grabbing the per cell and per gene quality metrics.
# get path to fry knee output directory
fry_knee_dir <- file.path(quants_dir, "alevin-fry-knee", sample_ids)
fry_knee_dir <- paste0(fry_knee_dir, "-Homo_sapiens.GRCh38.104.spliced_intron.txome-salign-cr-like-em-knee")
# get path to fry unfiltered output directory
fry_unfiltered_dir <- file.path(quants_dir, "alevin-fry-unfiltered", sample_ids)
fry_unfiltered_dir <- paste0(fry_unfiltered_dir, "-Homo_sapiens.GRCh38.104.spliced_intron.txome-salign-cr-like-em")
# get path to fry unfiltered sketch output directory
fry_sketch_dir <- file.path(quants_dir, "alevin-fry-unfiltered", sample_ids)
fry_sketch_dir <- paste0(fry_sketch_dir, "-Homo_sapiens.GRCh38.104.spliced_intron.txome-sketch-cr-like-em")
# paths to spatial folders
cellranger_folders <- paste0(sample_ids, "-GRCh38_104_cellranger_full-spatial")
spaceranger_dir <- file.path(quants_dir, "cellranger", cellranger_folders)
# read in combined fry and spaceranger spe for fry knee
fry_knee_spe_1 <- create_fry_spaceranger_spe(fry_knee_dir[1],
spaceranger_dir[1],
sample_ids[1])
Rows: 4992 Columns: 6
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
fry_knee_spe_2 <- create_fry_spaceranger_spe(fry_knee_dir[2],
spaceranger_dir[2],
sample_ids[2])
Rows: 4992 Columns: 6
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
# read in combined fry and spaceranger spe for fry unfiltered
fry_unfiltered_spe_1 <- create_fry_spaceranger_spe(fry_unfiltered_dir[1],
spaceranger_dir[1],
sample_ids[1])
Rows: 4992 Columns: 6
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
fry_unfiltered_spe_2 <- create_fry_spaceranger_spe(fry_unfiltered_dir[2],
spaceranger_dir[2],
sample_ids[2])
Rows: 4992 Columns: 6
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
# read in combined fry and spaceranger spe for fry knee
fry_sketch_spe_1 <- create_fry_spaceranger_spe(fry_sketch_dir[1],
spaceranger_dir[1],
sample_ids[1])
Rows: 4992 Columns: 6
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
fry_sketch_spe_2 <- create_fry_spaceranger_spe(fry_sketch_dir[2],
spaceranger_dir[2],
sample_ids[2])
Rows: 4992 Columns: 6
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
# spaceranger output paths
spaceranger_dir <- file.path(quants_dir, "cellranger", cellranger_folders)
# read in spaceranger output directly using read10XVisium
spaceranger_spe_1 <- read10xVisium(file.path(spaceranger_dir[1], "outs"), sample_id = sample_ids[1])
spaceranger_spe_2 <- read10xVisium(file.path(spaceranger_dir[2], "outs"), sample_id = sample_ids[2])
Now that we have read in the data and created our two SpatialExperiment objects, we can go ahead and combine them into one list and then calculate the per spot QC metrics using scuttle::addPerCellQCMetrics().
# create one list with both spe's together
all_spe_list <- list(fry_knee_spe_1, fry_unfiltered_spe_1,
fry_sketch_spe_1, spaceranger_spe_1,
fry_knee_spe_2, fry_unfiltered_spe_2,
fry_sketch_spe_2, spaceranger_spe_2)
# name each spe with combination of sample_id-tool
spe_names <- c("SCPCR000372-alevin-fry-knee", "SCPCR000372-alevin-fry-unfiltered",
"SCPCR000372-alevin-fry-unfiltered-sketch", "SCPCR000372-spaceranger",
"SCPCR000373-alevin-fry-knee", "SCPCR000373-alevin-fry-unfiltered",
"SCPCR000373-alevin-fry-unfiltered-sketch", "SCPCR000373-spaceranger")
names(all_spe_list) <- spe_names
# calculate per cell QC and output to a combined data frame with plotting
all_spe_list <- all_spe_list %>%
purrr::map(
~ scuttle::addPerCellQCMetrics(.x,
subsets = list(mito = mito_genes[mito_genes %in% rownames(.x)])))
After adding in the per spot QC metrics to both of the spe’s, we want to extract the colData from each spe and create a data frame that we can use for plotting. We will also need some information about each sample and how it was run, so we will create a sample metadata table, sample_info_df that will then be merged with the colData.
# create sample info dataframe to be joined with per spot dataframe later
sample_info_df <- quant_info_table(data_dir= quants_dir,
tools = c("cellranger", "alevin-fry-knee", "alevin-fry-unfiltered"),
samples = sample_ids) %>%
# convert cellranger to spaceranger and paste filtering strategy to alevin-fry
dplyr::mutate(tool = ifelse(tool == "cellranger", "spaceranger", paste(tool, filter_strategy, sep = "-")),
tool = dplyr::case_when(alevin_alignment == "sketch" ~ paste(tool, alevin_alignment, sep = "-"),
alevin_alignment != "sketch" ~ tool))
sample_info_df
When we convert the colData to a data frame we use the custom function, spatial_coldata_to_df() to do so and apply it to each spe in our list.
fry_knee_names <- c("SCPCR000372-alevin-fry-knee", "SCPCR000373-alevin-fry-knee")
fry_unfiltered_names <- c("SCPCR000372-alevin-fry-unfiltered", "SCPCR000373-alevin-fry-unfiltered")
fry_unfiltered_sketch_names <- c("SCPCR000372-alevin-fry-unfiltered-sketch", "SCPCR000373-alevin-fry-unfiltered-sketch")
spaceranger_names <- c("SCPCR000372-spaceranger", "SCPCR000373-spaceranger")
# join coldata dataframe with sample info
coldata_df <- all_spe_list %>%
purrr::map_df(spatial_coldata_to_df, .id = "tool") %>%
# remove extra -1 from spaceranger barcodes
dplyr::mutate(spot_id = gsub("-1", "", spot_id),
# remove tool from sample id
sample_id = stringr::word(sample_id, 1, sep = "-"),
# remove sample id from tool
tool = dplyr::case_when(tool %in% fry_knee_names ~ "alevin-fry-knee",
tool %in% fry_unfiltered_names ~ "alevin-fry-unfiltered",
tool %in% fry_unfiltered_sketch_names ~ "alevin-fry-unfiltered-sketch",
tool %in% spaceranger_names ~ "spaceranger")) %>%
dplyr::left_join(sample_info_df,
by = c("tool", "sample_id" = "sample")) %>%
# remove spots that are not overlapping tissue
dplyr::filter(in_tissue == 1)
Now we only want to filter our data frame to contain spots that are shared between both tools.
# identify shared spots only
spot_counts <- coldata_df %>%
dplyr::count(spot_id, sample_id)
# how many spots are shared among the tools
spot_counts_plot <- coldata_df %>%
dplyr::group_by(spot_id, sample_id) %>%
dplyr::summarise(tools_detected = list(unique(tool)))
`summarise()` has grouped output by 'spot_id'. You can override using the `.groups` argument.
ggplot(spot_counts_plot, aes(x = tools_detected))+
geom_bar() +
scale_x_upset(n_intersections = 4)
For the most part, the majority of the spots identified are found in both Spaceranger alone and the combination with Alevin-fry-knee and Alevin-fry-unfiltered, with a small subset being identified in Spaceranger and Alevin-fry-unfiltered. It appears that using alevin-fry-unfiltered does give us some spots that using the knee method does not give us and we don’t see any loss of spots.
Let’s filter to only include these common spots.
common_spots <- spot_counts %>%
dplyr::filter(n == 4) %>%
dplyr::pull(spot_id)
coldata_df_common <- coldata_df %>%
dplyr::filter(spot_id %in% common_spots)
We will also need to filter the spe’s directly based on spots that are present in the tissue, so we create a small function to do this and then apply it to both spe’s in the list.
# we will also want to filter the spe's directly
filter_spe <- function(spe){
spe <- spe[, spatialData(spe)$in_tissue == 1]
}
all_spe_filter <- all_spe_list %>%
purrr::map(filter_spe)
When we look at our results, we will also want to visualize them so we will make a custom function to plot the results.
# custom function for plotting spe results and coloring by column of colData of choice
plot_spe <- function(spe, sample, column){
# plot spots only
p1 <- ggspavis::plotSpots(spe,
x_coord = "pxl_col_in_fullres",
y_coord = "pxl_row_in_fullres",
annotate = column) +
scale_color_viridis_c()
# plot with tissue underneath
p2 <- ggspavis::plotVisium(spe,
x_coord = "pxl_col_in_fullres",
y_coord = "pxl_row_in_fullres",
fill = column) +
scale_fill_viridis_c()
# arrange plots and add sample name as title
grid.arrange(p1, p2, nrow = 1, top = grid::textGrob(sample))
}
First we will look at the per cell metrics: mitochondrial reads per cell, total UMI per cell, and total genes detected per cell.
# % mitochondrial reads/ spot
ggplot(coldata_df_common, aes(x = tool, y = subsets_mito_percent, fill = tool)) +
geom_boxplot() +
facet_wrap(~ sample_id) +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Mito Percent") +
xlab("")
all_spe_filter %>%
purrr::iwalk(plot_spe, column = "subsets_mito_percent")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing scale.
Overall it looks like mitochondrial content is low and fairly similar across both tools.
# total UMI/ spot
ggplot(coldata_df_common, aes(x = sum, color = tool)) +
geom_density() +
facet_wrap(~ sample_id) +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("UMI/spot") +
xlab("")
all_spe_filter %>%
purrr::iwalk(plot_spe, column = "sum")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing scale.
# total genes/ spot
ggplot(coldata_df_common, aes(x = detected, color = tool)) +
geom_density() +
facet_wrap(~ sample_id) +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Genes detected/spot") +
xlab("")
all_spe_filter %>%
purrr::iwalk(plot_spe, column = "detected")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing scale.
Generally it looks like the tools are fairly similar, except that Alevin-fry shows a slight decrease in both total UMI/cell and genes detected/cell when compared to Spaceranger alone, which is seen in the spatial plot as well. Alevin-fry-knee and alevin-fry-unfiltered seem to almost completely overlap in terms of quantification with the only difference being a few spots that were not detected using the knee method are now identified in the unfiltered method. If using alevin-fry with --sketch we see that the UMI/cell and genes detected/cell almost completely overlap the counts observed in Spaceranger. This is also seen in the plots, as the plots visually look almost identical when using --sketch but not selective alignment.
Let’s also look at the correlation of mean gene expression across shared genes. We will first need to calculate the per feature QC on the filtered spes after removing spots not present in the tissue and then grab the rowData and combine into a data frame used for plotting.
all_spe_filter <- all_spe_filter %>%
purrr::map(scuttle::addPerFeatureQCMetrics)
# grab rowdata and combine with sample info
rowdata_df <- purrr::map_df(all_spe_filter, scpcaTools::rowdata_to_df, .id = "tool") %>%
# extract sample_id from tool and create a new column to avoid duplicates
dplyr::mutate(sample_id = stringr::word(tool, 1, sep = "-"),
tool = dplyr::case_when(tool %in% fry_knee_names ~ "alevin-fry-knee",
tool %in% fry_unfiltered_names ~ "alevin-fry-unfiltered",
tool %in% fry_unfiltered_sketch_names ~ "alevin-fry-unfiltered-sketch",
tool %in% spaceranger_names ~ "spaceranger")) %>%
dplyr::left_join(sample_info_df,
by = c("tool", "sample_id" = "sample"))
We then want to filter out any lowly detected genes, (detected < 5.0) and restrict our analysis to those genes that are found in both tools.
gene_counts <- rowdata_df %>%
# remove genes that have a low frequency of being detected
dplyr::filter(detected >= 5.0) %>%
dplyr::count(gene_id, sample_id)
# restrict to only common genes
common_genes <- gene_counts %>%
dplyr::filter(n == 4) %>%
dplyr::pull(gene_id)
rowdata_df_common <- rowdata_df %>%
dplyr::filter(gene_id %in% common_genes)
# create a table to calculate correlation between mean gene expression
rowdata_cor <- rowdata_df_common %>%
dplyr::select(tool, gene_id, sample_id, mean) %>%
# spread the mean expression stats to one column per caller
tidyr::pivot_wider(id_cols = c(gene_id, sample_id),
names_from = c("tool"),
values_from = mean) %>%
# drop rows with NA values to ease correlation calculation
tidyr::drop_na()
# look at correlation between the two tools
rowdata_cor %>%
dplyr::group_by(sample_id) %>%
dplyr::summarize(
alevin_fry_knee_spaceranger_cor = cor(`spaceranger`, `alevin-fry-knee`, method = "spearman"),
alevin_fry_unfiltered_spaceranger_cor= cor(`spaceranger`, `alevin-fry-unfiltered`, method = "spearman"),
alevin_fry_unfiltered_sketch_spaceranger_cor = cor(`spaceranger`, `alevin-fry-unfiltered-sketch`, method = "spearman")
)
# mean gene expression across shared genes
ggplot(rowdata_cor, aes(x = `spaceranger`, y = `alevin-fry-knee`)) +
geom_point(size = 0.5, alpha = 0.1) +
scale_x_log10() +
scale_y_log10() +
facet_wrap(~sample_id) +
geom_abline() +
labs(x = "Spaceranger mean gene expression", y = "Alevin Fry Knee Mean gene expression") +
theme_classic()
NA
ggplot(rowdata_cor, aes(x = `spaceranger`, y = `alevin-fry-unfiltered`)) +
geom_point(size = 0.5, alpha = 0.1) +
scale_x_log10() +
scale_y_log10() +
facet_wrap(~sample_id) +
geom_abline() +
labs(x = "Spaceranger mean gene expression", y = "Alevin Fry Unfiltered Mean gene expression") +
theme_classic()
ggplot(rowdata_cor, aes(x = `spaceranger`, y = `alevin-fry-unfiltered-sketch`)) +
geom_point(size = 0.5, alpha = 0.1) +
scale_x_log10() +
scale_y_log10() +
facet_wrap(~sample_id) +
geom_abline() +
labs(x = "Spaceranger mean gene expression", y = "Alevin Fry Unfiltered Sketch Mean gene expression") +
theme_classic()
Correlation appears to be quite high between mean gene expression in Spaceranger and Alevin-fry, however, we do see that generally genes have higher gene expression in Spaceranger than in Alevin-fry and are slightly off the diagonal.
In alevin-fry --sketch the genes line up along the x=y axis and appear to have less of an increase in expression with Spaceranger quantification, however the group of genes that are slightly off the diagonal is still present.
There is a subset of genes that seems to be slightly more affected and be further off the diagonal. Let’s take at those gene for each sample. It does appear to be less dramatic in SCPCR000373. We will only focus on without --sketch, as that is where we see the largest inmpact on gene expression differences.
# get the gene symbols and then join back with rowdata df
gene_symbols_df <- rowdata_df %>%
dplyr::select(gene_id, symbol)
# join correlation with gene symbols
rowdata_cor <- rowdata_cor %>%
dplyr::left_join(gene_symbols_df) %>%
dplyr::distinct() %>%
tidyr::drop_na() %>%
# add difference in mean gene expression between alevin-fry and spaceranger
dplyr::mutate(knee_log_fold_change = log(`alevin-fry-knee`/spaceranger),
unfiltered_log_fold_change = log(`alevin-fry-unfiltered`/`spaceranger`))
Joining, by = "gene_id"
Let’s see if we can specificall identify the group of genes that are off the diagonal by labeling them with a different color.
rowdata_cor <- rowdata_cor %>%
dplyr::mutate(knee_diff = ifelse(
knee_log_fold_change < - 0.75 &
spaceranger < 5 &
spaceranger > -1, "diff_expression", "equal_expression"),
unfiltered_diff = ifelse(
unfiltered_log_fold_change < - 0.75 &
spaceranger < 5 &
spaceranger > -1, "diff_expression", "equal_expression"))
ggplot(rowdata_cor, aes(x = `spaceranger`, y = `alevin-fry-knee`, color = knee_diff)) +
geom_point(size = 0.5, alpha = 0.1) +
scale_x_log10() +
scale_y_log10() +
geom_abline() +
facet_wrap(~ sample_id, nrow = 2) +
labs(x = "Spaceranger mean gene expression", y = "Alevin Fry Knee Mean gene expression") +
theme_classic()
ggplot(rowdata_cor, aes(x = `spaceranger`, y = `alevin-fry-unfiltered`, color = unfiltered_diff)) +
geom_point(size = 0.5, alpha = 0.1) +
scale_x_log10() +
scale_y_log10() +
geom_abline() +
facet_wrap(~ sample_id, nrow = 2) +
labs(x = "Spaceranger mean gene expression", y = "Alevin Fry Unfiltered Mean gene expression") +
theme_classic()
It looks like the genes that this separate group of genes have a log(fold change) < -0.75 and mean gene expression in spaceranger < 5 and > -1. Let’s get that list of genes in both alevin-fry-knee and alevin-fry-unfiltered.
knee_diff_gene_counts <- rowdata_cor %>%
dplyr::filter(knee_diff == "diff_expression") %>%
dplyr::count(gene_id) %>%
dplyr::filter(n == 2) %>%
dplyr::pull(gene_id)
knee_diff_genes <- rowdata_cor %>%
dplyr::filter(gene_id %in% knee_diff_gene_counts) %>%
dplyr::arrange(symbol)
knee_diff_genes
unfiltered_diff_gene_counts <- rowdata_cor %>%
dplyr::filter(unfiltered_diff == "diff_expression") %>%
dplyr::count(gene_id) %>%
dplyr::filter(n == 2) %>%
dplyr::pull(gene_id)
unfiltered_diff_genes <- rowdata_cor %>%
dplyr::filter(gene_id %in% unfiltered_diff_gene_counts) %>%
dplyr::arrange(symbol)
unfiltered_diff_genes
Let’s look at what types of genes are found to have different gene expression across these two tools using over representation analysis. First, let’s look specifically at that group of genes that is slightly off the diagonal (considering only genes found in both samples), then we will look at all genes with abs(log(fold change) > 0.5)
# unfiltered and knee outlier gene lists
knee_diff_genes <- knee_diff_genes$symbol %>%
unique()
unfiltered_diff_genes <- unfiltered_diff_genes$symbol %>%
unique()
# background gene list
background_genes <- rowdata_cor$symbol %>%
unique()
knee_go_ora_results <- enrichGO(gene = knee_outlier_genes,
universe = background_genes,
keyType = "SYMBOL",
OrgDb = org.Hs.eg.db,
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.00001)
unfiltered_go_ora_results <- enrichGO(gene = unfiltered_outlier_genes,
universe = background_genes,
keyType = "SYMBOL",
OrgDb = org.Hs.eg.db,
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.00001)
knee_go_results <- knee_go_ora_results@result %>%
as.data.frame() %>%
dplyr::filter(p.adjust < 0.2)
knee_go_results
unfiltered_go_results <- unfiltered_go_ora_results@result %>%
as.data.frame() %>%
dplyr::filter(p.adjust < 0.2)
unfiltered_go_results
In order to see any enrichment results, the adjusted p-value threshold has to be increased to 0.2 so there is very little confidence that these pathways are actually enriched. Therefore, it appears that there are no significant pathways that are affected by the group of genes that are found to be off the diagonal between spaceranger and alevin-fry.
Let’s take a look at all genes that have at least 0.5 log fold change between Spaceranger and Alevin-fry.
# find genes that have different gene expression in both samples between spaceranger and fry-knee or fry-unfiltered
different_gene_counts <- rowdata_cor %>%
dplyr::filter(abs(knee_log_fold_change) > 0.5 | abs(unfiltered_log_fold_change) > 0.5) %>%
dplyr::count(gene_id) %>%
dplyr::filter(n == 2) %>%
dplyr::pull(gene_id)
# print out list of genes with with > 1.5 gene expression in Alevin-fry
# only include genes that are found in both samples
rowdata_cor %>%
dplyr::filter(gene_id %in% different_gene_counts) %>%
dplyr::arrange(symbol) %>%
dplyr::select(sample_id, symbol, gene_id, knee_log_fold_change,
unfiltered_log_fold_change, `alevin-fry-knee`,
`alevin-fry-unfiltered`, spaceranger)
# extract target gene list for ORA, 886 genes
different_genes <- rowdata_cor %>%
# filter for anything with fold change > 1.5 and found in both samples
dplyr::filter(gene_id %in% different_gene_counts) %>%
dplyr::pull(symbol) %>%
unique()
We are only going to use one list of genes here to do over representation analysis, because the same genes are found to have different gene expression between Alevin-fry and spaceranger regardless of filtering strategy for Alevin-fry.
# perform gene ontology looking at all genes that are different
go_ora_results <- enrichGO(gene = different_genes,
universe = background_genes,
keyType = "SYMBOL",
OrgDb = org.Hs.eg.db,
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.00001)
# look at gene ontology results
go_results <- go_ora_results@result %>%
as.data.frame() %>%
dplyr::filter(p.adjust < 0.2)
go_results
It looks like when you filter to only include genes that are found to have different mean gene expression between Spaceranger and Alevin-fry in both samples there are no specific pathways identified that the genes belong to.
--sketch alignment with Alevin-fry.--sketch alignment and more genes that were quantified in Spaceranger and were not quantified in Alevin-fry with selective alignment are now quantified with --sketch.sessioninfo::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.1.1 (2021-08-10)
os macOS Catalina 10.15.7
system x86_64, darwin17.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Chicago
date 2021-12-13
rstudio 1.4.1106 Tiger Daylily (desktop)
pandoc 2.11.4 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown)
─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.1.0)
AnnotationDbi * 1.56.2 2021-11-09 [1] Bioconductor
ape 5.5 2021-04-25 [1] CRAN (R 4.1.0)
aplot 0.1.1 2021-09-22 [1] CRAN (R 4.1.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
beachmat 2.10.0 2021-10-26 [1] Bioconductor
Biobase * 2.54.0 2021-10-26 [1] Bioconductor
BiocGenerics * 0.40.0 2021-10-26 [1] Bioconductor
BiocIO 1.4.0 2021-10-26 [1] Bioconductor
BiocParallel 1.28.2 2021-11-25 [1] Bioconductor
Biostrings 2.62.0 2021-10-26 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
blob 1.2.2 2021-07-23 [1] CRAN (R 4.1.0)
bslib 0.3.1 2021-10-06 [1] CRAN (R 4.1.0)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.0)
cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1)
cluster 2.1.2 2021-04-17 [1] CRAN (R 4.1.1)
clusterProfiler * 4.2.0 2021-10-26 [1] Bioconductor
codetools 0.2-18 2020-11-04 [1] CRAN (R 4.1.1)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.1.0)
crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.0)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.1.0)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
DelayedArray 0.20.0 2021-10-26 [1] Bioconductor
DelayedMatrixStats 1.16.0 2021-10-26 [1] Bioconductor
deldir 1.0-6 2021-10-23 [1] CRAN (R 4.1.0)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.0)
DO.db 2.9 2021-11-30 [1] Bioconductor
DOSE 3.20.1 2021-11-18 [1] Bioconductor
downloader 0.4 2015-07-09 [1] CRAN (R 4.1.0)
dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
dqrng 0.3.0 2021-05-01 [1] CRAN (R 4.1.0)
DropletUtils 1.14.1 2021-11-08 [1] Bioconductor
edgeR 3.36.0 2021-10-26 [1] Bioconductor
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
enrichplot 1.14.1 2021-10-31 [1] Bioconductor
evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
fastmatch 1.1-3 2021-07-23 [1] CRAN (R 4.1.0)
fgsea 1.20.0 2021-10-26 [1] Bioconductor
fitdistrplus 1.1-6 2021-09-28 [1] CRAN (R 4.1.0)
future 1.23.0 2021-10-31 [1] CRAN (R 4.1.0)
future.apply 1.8.1 2021-08-10 [1] CRAN (R 4.1.0)
generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.0)
GenomeInfoDb * 1.30.0 2021-10-26 [1] Bioconductor
GenomeInfoDbData 1.2.7 2021-11-16 [1] Bioconductor
GenomicAlignments 1.30.0 2021-10-26 [1] Bioconductor
GenomicRanges * 1.46.1 2021-11-18 [1] Bioconductor
ggforce 0.3.3 2021-03-05 [1] CRAN (R 4.1.0)
ggfun 0.0.4 2021-09-17 [1] CRAN (R 4.1.0)
ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
ggplotify 0.1.0 2021-09-02 [1] CRAN (R 4.1.0)
ggraph 2.0.5 2021-02-23 [1] CRAN (R 4.1.0)
ggrepel * 0.9.1 2021-01-15 [1] CRAN (R 4.1.0)
ggridges 0.5.3 2021-01-08 [1] CRAN (R 4.1.0)
ggside 0.1.3 2021-10-24 [1] CRAN (R 4.1.0)
ggspavis 1.0.0 2021-10-26 [1] Bioconductor
ggtree 3.2.1 2021-11-16 [1] Bioconductor
ggupset * 0.3.0 2020-05-05 [1] CRAN (R 4.1.0)
globals 0.14.0 2020-11-22 [1] CRAN (R 4.1.0)
glue 1.5.1 2021-11-30 [1] CRAN (R 4.1.1)
GO.db 3.14.0 2021-11-30 [1] Bioconductor
goftest 1.2-3 2021-10-07 [1] CRAN (R 4.1.0)
GOSemSim 2.20.0 2021-10-26 [1] Bioconductor
graphlayouts 0.7.2 2021-11-21 [1] CRAN (R 4.1.0)
gridExtra * 2.3 2017-09-09 [1] CRAN (R 4.1.0)
gridGraphics 0.5-1 2020-12-13 [1] CRAN (R 4.1.0)
grr 0.9.5 2016-08-26 [1] CRAN (R 4.1.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
HDF5Array 1.22.1 2021-11-14 [1] Bioconductor
here 1.0.1 2020-12-13 [1] CRAN (R 4.1.0)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.0)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.0)
httpuv 1.6.3 2021-09-09 [1] CRAN (R 4.1.0)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0)
ica 1.0-2 2018-05-24 [1] CRAN (R 4.1.0)
igraph 1.2.9 2021-11-23 [1] CRAN (R 4.1.0)
IRanges * 2.28.0 2021-10-26 [1] Bioconductor
irlba 2.3.3 2019-02-05 [1] CRAN (R 4.1.0)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0)
KEGGREST 1.34.0 2021-10-26 [1] Bioconductor
KernSmooth 2.23-20 2021-05-03 [1] CRAN (R 4.1.1)
knitr 1.36 2021-09-29 [1] CRAN (R 4.1.0)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.0)
later 1.3.0 2021-08-18 [1] CRAN (R 4.1.0)
lattice 0.20-45 2021-09-22 [1] CRAN (R 4.1.0)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.1.0)
leiden 0.3.9 2021-07-27 [1] CRAN (R 4.1.0)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
limma 3.50.0 2021-10-26 [1] Bioconductor
listenv 0.8.0 2019-12-05 [1] CRAN (R 4.1.0)
lmtest 0.9-39 2021-11-07 [1] CRAN (R 4.1.0)
locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.1.0)
magick 2.7.3 2021-08-18 [1] CRAN (R 4.1.0)
magrittr * 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
MASS 7.3-54 2021-05-03 [1] CRAN (R 4.1.1)
Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.1)
Matrix.utils 0.9.8 2020-02-26 [1] CRAN (R 4.1.0)
MatrixGenerics * 1.6.0 2021-10-26 [1] Bioconductor
matrixStats * 0.61.0 2021-09-17 [1] CRAN (R 4.1.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.1.0)
mgcv 1.8-38 2021-10-06 [1] CRAN (R 4.1.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.1.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.1.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
nlme 3.1-153 2021-09-07 [1] CRAN (R 4.1.0)
org.Hs.eg.db * 3.14.0 2021-11-16 [1] Bioconductor
parallelly 1.29.0 2021-11-21 [1] CRAN (R 4.1.0)
patchwork 1.1.1 2020-12-17 [1] CRAN (R 4.1.0)
pbapply 1.5-0 2021-09-16 [1] CRAN (R 4.1.0)
pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
plotly 4.10.0 2021-10-09 [1] CRAN (R 4.1.0)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.1.0)
png 0.1-7 2013-12-03 [1] CRAN (R 4.1.0)
polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.1.0)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
qvalue 2.26.0 2021-10-26 [1] Bioconductor
R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
RANN 2.6.1 2019-01-08 [1] CRAN (R 4.1.0)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.1.0)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0)
RcppAnnoy 0.0.19 2021-07-30 [1] CRAN (R 4.1.0)
RCurl 1.98-1.5 2021-09-17 [1] CRAN (R 4.1.0)
readr 2.1.1 2021-11-30 [1] CRAN (R 4.1.1)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.1.0)
restfulr 0.0.13 2017-08-06 [1] CRAN (R 4.1.0)
reticulate 1.22 2021-09-17 [1] CRAN (R 4.1.0)
rhdf5 2.38.0 2021-10-26 [1] Bioconductor
rhdf5filters 1.6.0 2021-10-26 [1] Bioconductor
Rhdf5lib 1.16.0 2021-10-26 [1] Bioconductor
rjson 0.2.20 2018-06-08 [1] CRAN (R 4.1.0)
rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1)
rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.0)
ROCR 1.0-11 2020-05-02 [1] CRAN (R 4.1.0)
rpart 4.1-15 2019-04-12 [1] CRAN (R 4.1.1)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
Rsamtools 2.10.0 2021-10-26 [1] Bioconductor
RSQLite 2.2.9 2021-12-06 [1] CRAN (R 4.1.1)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
rtracklayer 1.54.0 2021-10-26 [1] Bioconductor
Rtsne 0.15 2018-11-10 [1] CRAN (R 4.1.0)
S4Vectors * 0.32.3 2021-11-21 [1] Bioconductor
sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
scattermore 0.7 2020-11-24 [1] CRAN (R 4.1.0)
scatterpie 0.1.7 2021-08-20 [1] CRAN (R 4.1.0)
scpcaTools 0.1.2 2021-10-12 [1] Github (AlexsLemonade/scpcaTools@2cdad4c)
sctransform 0.3.2 2020-12-16 [1] CRAN (R 4.1.0)
scuttle 1.4.0 2021-10-26 [1] Bioconductor
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.1)
Seurat 4.0.5 2021-10-17 [1] CRAN (R 4.1.0)
SeuratObject 4.0.4 2021-11-23 [1] CRAN (R 4.1.0)
shadowtext 0.0.9 2021-09-19 [1] CRAN (R 4.1.0)
shiny 1.7.1 2021-10-02 [1] CRAN (R 4.1.0)
SingleCellExperiment * 1.16.0 2021-10-26 [1] Bioconductor
sparseMatrixStats 1.6.0 2021-10-26 [1] Bioconductor
SpatialExperiment * 1.3.6 2021-12-06 [1] Github (drighelli/SpatialExperiment@ddb15e0)
spatstat.core 2.3-2 2021-11-26 [1] CRAN (R 4.1.1)
spatstat.data 2.1-0 2021-03-21 [1] CRAN (R 4.1.0)
spatstat.geom 2.3-0 2021-10-09 [1] CRAN (R 4.1.0)
spatstat.sparse 2.0-0 2021-03-16 [1] CRAN (R 4.1.0)
spatstat.utils 2.2-0 2021-06-14 [1] CRAN (R 4.1.0)
stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.1)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
SummarizedExperiment * 1.24.0 2021-10-26 [1] Bioconductor
survival 3.2-13 2021-08-24 [1] CRAN (R 4.1.0)
tensor 1.5 2012-05-05 [1] CRAN (R 4.1.0)
tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.0)
tidygraph 1.2.0 2020-05-12 [1] CRAN (R 4.1.0)
tidyr 1.1.4 2021-09-27 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
tidytree 0.3.6 2021-11-12 [1] CRAN (R 4.1.0)
tinytex 0.35 2021-11-04 [1] CRAN (R 4.1.0)
treeio 1.18.1 2021-11-14 [1] Bioconductor
tweenr 1.0.2 2021-03-23 [1] CRAN (R 4.1.0)
tzdb 0.2.0 2021-10-27 [1] CRAN (R 4.1.1)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
uwot 0.1.11 2021-12-02 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
viridis 0.6.2 2021-10-13 [1] CRAN (R 4.1.0)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.1.0)
vroom 1.5.7 2021-11-30 [1] CRAN (R 4.1.1)
withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.1)
xfun 0.28 2021-11-04 [1] CRAN (R 4.1.0)
XML 3.99-0.8 2021-09-17 [1] CRAN (R 4.1.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.0)
XVector 0.34.0 2021-10-26 [1] Bioconductor
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
yulab.utils 0.0.4 2021-10-09 [1] CRAN (R 4.1.0)
zlibbioc 1.40.0 2021-10-26 [1] Bioconductor
zoo 1.8-9 2021-03-09 [1] CRAN (R 4.1.0)
[1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────